1. INTRODUCTION

Classification is an important function in data mining. One of the main issues in performing classification is choosing a classifier that obtains good classification accuracy. A single classifier makes minimal use of the complementary information that other classifiers could provide, whereas a combination of multiple classifiers can exploit such additional information [1]. The goal of multiple classifier combination is to obtain a comprehensive result by combining the outputs of several individual classifiers [2]. Such a system consists of a set of classifiers, called the classifier ensemble, and a strategy for integrating the classifier outputs, called the combiner.

Multiple classifier combination has been widely used in many application domains, such as speech recognition [3], human emotion recognition [4], video classification [5], face recognition [6], email classification [7], cancer classification [8], plant leaf identification [9], concept drift identification [10] and sukuk rating prediction [11]. It has been very useful in enhancing classification performance. However, developing a multiple classifier combination involves two problems: constructing the classifier ensemble and constructing the combiner. There are no standard guidelines on how to construct a set of diverse and accurate classifiers or on how to combine their outputs [12]. Most previous studies focus on classifier ensemble construction and apply a simple fixed combiner to the outputs [13]. This study addresses both problems, focusing on feature set partitioning for ensemble construction and on the weighted voting combiner.

There are several approaches to constructing a classifier ensemble. All of them attempt to generate diversity by creating classifiers that make errors on different patterns, so that the classifiers can be combined effectively. Diversity among the classifiers in an ensemble is deemed a key success factor in constructing a classifier ensemble; it has been shown both theoretically and empirically that a good ensemble is both accurate and diverse [14]. One approach to constructing a classifier ensemble is the feature decomposition method, which manipulates the input features to build a diverse classifier ensemble. Because this method decomposes the input features while training the classifier ensemble, it is appropriate for high-dimensional data sets [15].

One case of feature decomposition is feature set partitioning, in which the input features are randomly partitioned into several disjoint subsets, so that each classifier is trained on a different subset. Feature set partitioning is appropriate for classification tasks containing a large number of features [16], [17]. However, it is difficult to determine how to form an optimal feature set partition that trains the classifiers to perform well.
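A minimal sketch of random disjoint feature set partitioning, assuming features are identified by 0-based column indices and that every subset must be non-empty; `random_feature_partition` and `project` are hypothetical helper names. Every feature lands in exactly one subset, so no feature is discarded, and each classifier would be trained on the projection of the data onto its own subset.

```python
import random

def random_feature_partition(n_features, n_subsets, seed=None):
    """Randomly split feature indices 0..n_features-1 into n_subsets disjoint, non-empty subsets."""
    if not 1 <= n_subsets <= n_features:
        raise ValueError("need 1 <= n_subsets <= n_features")
    rng = random.Random(seed)
    order = list(range(n_features))
    rng.shuffle(order)
    # Seed every subset with one feature so that none is empty...
    subsets = [[f] for f in order[:n_subsets]]
    # ...then send each remaining feature to a randomly chosen subset.
    for f in order[n_subsets:]:
        subsets[rng.randrange(n_subsets)].append(f)
    return [sorted(s) for s in subsets]

def project(rows, subset):
    """Projection of a dataset (list of rows) onto the columns listed in `subset`."""
    return [[row[f] for f in subset] for row in rows]

partition = random_feature_partition(9, 3, seed=0)
X = [[10, 11, 12, 13, 14, 15, 16, 17, 18]]
for subset in partition:
    print(subset, project(X, subset))
```

The difficulty raised above is visible here: a random split says nothing about whether the resulting subsets train good classifiers, which is what motivates searching over partitions instead.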

Reviews of the set partitioning problem highlight that the ant system, a variant of ant colony optimization (ACO), is the most promising technique to apply [18]. The ACO algorithm was introduced by Marco Dorigo in the early 1990s. It is inspired by the behavior of ants searching for the shortest path from the colony to a food source: ants leave pheromone on the paths they travel, and this trail guides other ants toward the shortest route. Ant-based algorithms have shown better performance than other popular heuristics such as simulated annealing and genetic algorithms [19]. The ant system (AS) algorithm is a variant of the ant-based algorithm; it is the original and most widely used ant-based algorithm for solving many optimization problems [20], and it has also been used to solve the set partitioning problem. Set partitioning problems are difficult combinatorial optimization problems [21]. The ant system approach to the set partitioning problem has already been applied in constructing a classifier ensemble [22].
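For reference, the two rules that define the original Ant System (as formulated by Dorigo and colleagues for path problems such as the travelling salesman problem) are sketched below. An ant $k$ at node $i$ moves to an allowed node $j \in \mathcal{N}_i^k$ with a probability that combines the pheromone $\tau_{ij}$ and a heuristic desirability $\eta_{ij}$, weighted by the exponents $\alpha$ and $\beta$; after all $m$ ants have built their solutions, pheromone evaporates at rate $\rho$ and each ant deposits an amount $Q/\ell_k$, where $\ell_k$ is the cost (tour length) of its solution. How nodes, edges and solution cost map onto feature set partitioning in this paper is not stated in this passage, so any such mapping is an assumption.

$$p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in \mathcal{N}_i^{k}} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}}, \qquad j \in \mathcal{N}_i^{k}$$

$$\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}, \qquad \Delta\tau_{ij}^{k} = \begin{cases} Q/\ell_k & \text{if ant } k \text{ used edge } (i,j) \\ 0 & \text{otherwise} \end{cases}$$

Evaporation keeps early, possibly poor choices from dominating, while the deposit term reinforces the components of cheaper (better) solutions.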

The most popular, fundamental and straightforward combiner is majority voting [23]. Each individual classifier votes for one class label, and the class label that appears most frequently among the individual classifier outputs is the final output. To avoid draws, an odd number of classifiers is usually used for voting. Majority voting is often used to combine multiple classifiers in solving classification problems [24]. Popular ensemble methods such as bagging and random forest use majority voting to combine classifier outputs, and boosting uses a weighted form of it. The advantages of majority voting include simplicity and low computational cost, and it can combine the outputs of classifiers regardless of which classifier is used. It is an optimal combiner in several ensemble methods [25]. However, its disadvantage is that it does not consider the strength of the individual classifiers [26].
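A minimal sketch of plain majority voting, assuming each classifier outputs a single class label (the labels below are hypothetical strings). Every vote counts equally, which is exactly the limitation noted above: a weak classifier's vote weighs as much as a strong one's.

```python
from collections import Counter

def majority_vote(labels):
    """Return the class label predicted by the most classifiers.

    Counter.most_common orders equal counts by first occurrence, so a draw
    goes to whichever tied label appears first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]

# Three classifiers (an odd number) cannot draw on a two-class problem.
print(majority_vote(["spam", "ham", "spam"]))  # spam
```

With an even number of voters on two classes a draw is possible, and the result then depends only on the arbitrary tie-breaking rule, which is why an odd ensemble size is preferred.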

Weighted voting is a trainable version of majority voting which, unlike majority voting, assigns a weight to each classifier before voting. The overall prediction is the class that receives the largest weighted vote from the classifier predictions. There are several ways to determine the classifier weights [27]. The advantages of weighted voting include its flexibility and its potential to perform better than majority voting. This combiner can also make multiple classifier combinations more robust to the choice of the number of individual classifiers [28]. In addition, if the accuracies of the classifiers can be reliably estimated, weighted voting may be considered [29]. Several studies have concentrated on weighted voting and have shown that it can solve real-world problems such as face and voice recognition [30] and financial distress prediction for listed companies [31]. Therefore, in this study weighted voting is adopted as a combiner that takes the performance of each classifier into account.
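As a hypothetical illustration (the weights are invented for the arithmetic, not taken from the experiments), suppose five classifiers vote on a two-class problem: three classifiers with weight 0.5 each vote for $\omega_1$ and two classifiers with weight 0.9 each vote for $\omega_2$. Majority voting picks $\omega_1$ (three votes against two), whereas weighted voting picks $\omega_2$:

$$\text{score}(\omega_1) = 0.5 + 0.5 + 0.5 = 1.5, \qquad \text{score}(\omega_2) = 0.9 + 0.9 = 1.8$$

The two stronger classifiers outweigh the three weaker ones, which is the behavior majority voting cannot express.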


2. RESEARCH METHOD 

There are three steps in this research: (1) classifier ensemble construction, (2) combiner construction and (3) evaluation. An effective multiple classifier system must address the first two steps, ensemble construction and combiner construction. The ant system feature set partitioning algorithm is applied to construct the classifier ensemble, while the weighted voting technique is applied as the combiner. Figure 1 shows the architecture of the proposed method, which consists of two components, namely the ant system feature set partitioning and the weighted voting combiner.
Figure 1. Architecture of the proposed method for multiple classifier system
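To make the two components concrete, the sketch below wires a given feature set partition to one k-NN per feature subset and combines their votes with validation-accuracy weights. It is an illustration under stated assumptions, not the authors' implementation: the partition is supplied by hand rather than found by the ant system, distances are Euclidean, the neighbourhood size `k` is a hypothetical parameter, and `project`, `knn_predict` and `ensemble_predict` are hypothetical names.

```python
import math
from collections import Counter

def project(row, subset):
    """Keep only the features listed in `subset` (a projection of one instance)."""
    return [row[f] for f in subset]

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN: most common label among the k nearest training rows (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def ensemble_predict(partition, train_X, train_y, val_X, val_y, x, k=3):
    """One k-NN per feature subset; each vote is weighted by that member's validation accuracy."""
    support = Counter()
    for subset in partition:
        tr = [project(r, subset) for r in train_X]
        hits = sum(knn_predict(tr, train_y, project(v, subset), k) == t
                   for v, t in zip(val_X, val_y))
        weight = hits / len(val_y)
        support[knn_predict(tr, train_y, project(x, subset), k)] += weight
    return support.most_common(1)[0][0]

# Toy data: feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]]
train_y = ["a", "a", "b", "b"]
val_X, val_y = [[0.05, 1.0], [1.05, 0.0]], ["a", "b"]
print(ensemble_predict([[0], [1]], train_X, train_y, val_X, val_y, [0.9, 1.0], k=1))  # b
```

In the toy run the member trained on the informative feature earns weight 1.0 and the noise-feature member earns 0.5, so their disagreement on the query is resolved in favour of the stronger member.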


2.1. Classifier Ensemble Construction 

The classifier ensemble is built with the feature set partitioning algorithm: a disjoint partition of the input feature set is formed by an algorithm based on the ant system. The number of feature partitions equals the number of individual classifiers. The required inputs are the feature set and the class labels of the original data set. The input feature set is partitioned into different feature subsets without removing any feature from the training set, so each individual classifier is trained on a different projection of the training set. The flowchart of the feature decomposition is depicted in Figure 2.
Figure 2. Flowchart of the ant system-based feature set partitioning algorithm
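Because the flowchart is not reproduced in this text, the following is only a minimal sketch of how an Ant System search over disjoint feature set partitions could look, under these assumptions: each ant assigns every feature to one of `max_parts` groups with probability proportional to pheromone raised to `alpha` (the heuristic-desirability term of the standard Ant System is omitted), empty groups are dropped so that the number of classifiers equals the number of non-empty groups, pheromone evaporates at rate `rho` and every ant then deposits its fitness on the choices it made, and `fitness` is a hypothetical user-supplied score to maximize (for example, validation accuracy of the weighted-voting ensemble built on the partition). The function name and all parameters are hypothetical.

```python
import random

def ant_system_partition(n_features, max_parts, fitness, n_ants=10, n_iters=20,
                         alpha=1.0, rho=0.1, seed=None):
    """Search disjoint feature partitions with a basic Ant System; return (best, best_score)."""
    rng = random.Random(seed)
    parts = list(range(max_parts))
    # tau[f][g]: pheromone on "feature f goes to group g"; uniform at the start.
    tau = [[1.0] * max_parts for _ in range(n_features)]
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Each ant assigns every feature to a group with probability proportional to tau**alpha.
            assign = [rng.choices(parts, weights=[t ** alpha for t in tau[f]])[0]
                      for f in range(n_features)]
            # Drop empty groups: the number of classifiers is the number of non-empty groups.
            groups = [[f for f in range(n_features) if assign[f] == g] for g in parts]
            partition = [grp for grp in groups if grp]
            score = fitness(partition)
            tours.append((assign, score))
            if score > best_score:
                best, best_score = partition, score
        # Evaporate every entry, then let each ant deposit its (non-negative) score.
        for f in range(n_features):
            for g in parts:
                tau[f][g] *= 1.0 - rho
        for assign, score in tours:
            for f, g in enumerate(assign):
                tau[f][g] += max(score, 0.0)
    return best, best_score

# Toy fitness that simply prefers fewer groups; it only makes the sketch runnable.
best, score = ant_system_partition(6, 3, lambda p: 1.0 / len(p), seed=1)
print(best, score)
```

A real fitness function would train the per-subset classifiers and score the combined ensemble, which is far more expensive than the toy score, so `n_ants` and `n_iters` directly control the cost of the search.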

2.2. Combiner Construction

In this construction stage, the weighted voting method is used as the combiner. The ant system algorithm carries out a learning process in which each classifier is trained on a different subset of the feature partition, and each classifier receives a weight according to its performance. Because the performance of each classifier depends on the feature set partition, the voting weights are updated dynamically as the partition changes. The idea behind this approach is that classifiers trained on different feature subsets will achieve different accuracies, even though only one type of classifier is used in the ensemble, and classifiers with high accuracy are more likely to classify patterns correctly.

Let $D = \{D_1, \ldots, D_L\}$ be a set of individual classifiers (an ensemble of classifiers), where $L$ is the number of individual classifiers. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$ be the set of class labels, where $c$ is the number of classes. Let $T = \{(x_i, y_i)\}$, $i = 1, \ldots, N$, be a training set (a labelled dataset), where $N$ is the number of instances, $x_i \in \mathbb{R}^n$ is the $n$-dimensional feature vector of the $i$-th instance and $y_i \in \Omega$ is its class label. Each classifier $D_j$ assigns an input feature vector to one of the predefined class labels, i.e., $D_j: \mathbb{R}^n \rightarrow \Omega$. The output of the classifier ensemble is an $L$-dimensional class label vector $[D_1(x), \ldots, D_L(x)]^T$. The task is to combine the $L$ individual classifier outputs to predict, from the set of possible class labels, the label that best classifies the unknown pattern.

In formulating the weighted voting combiner, assume that only the class labels are available from the classifier outputs, and define the decision of the $j$-th classifier as $d_{j,k} \in \{0, 1\}$, $j = 1, \ldots, L$ and $k = 1, \ldots, c$, where $L$ is the number of classifiers and $c$ is the number of classes. If the $j$-th classifier $D_j$ chooses class $\omega_k$, then $d_{j,k} = 1$, and $d_{j,k} = 0$ otherwise. The ensemble decision of the proposed weighted voting is then: choose class $\omega_{k^*}$ if


$$\sum_{j=1}^{L} acc_j \, d_{j,k^*}(x) = \max_{k=1,\ldots,c} \sum_{j=1}^{L} acc_j \, d_{j,k}(x) \qquad (1)$$


where $acc_j$ is the accuracy (weight) of classifier $D_j$. Each vote is multiplied by its classifier's weight before the votes are counted, and the weight is obtained by estimating the classifier's classification accuracy on a validation set.
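Equation 1 can be sketched directly: build the decision values $d_{j,k}$ from the classifiers' label outputs, weight each row by the classifier's validation accuracy $acc_j$, and return the class with the largest total. The validation labels, predictions and function names below are hypothetical, chosen so that weighted voting and majority voting disagree.

```python
def accuracy(predictions, labels):
    """Weight acc_j: fraction of validation instances classifier j labels correctly."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def weighted_vote(votes, weights, classes):
    """Equation 1: return the class omega_k* maximising sum_j acc_j * d_{j,k}."""
    # d[j][k] = 1 if classifier j voted for classes[k], else 0.
    d = [[1 if v == c else 0 for c in classes] for v in votes]
    support = [sum(w * d[j][k] for j, w in enumerate(weights)) for k in range(len(classes))]
    # max() keeps the first maximum, so an exact tie goes to the earlier class in `classes`.
    k_star = max(range(len(classes)), key=lambda k: support[k])
    return classes[k_star]

# Hypothetical validation labels and validation predictions of three classifiers.
val_y = ["a", "b", "a", "b", "a"]
val_preds = [
    ["a", "b", "a", "b", "a"],   # 5/5 correct
    ["a", "a", "a", "a", "a"],   # 3/5 correct
    ["b", "a", "a", "a", "b"],   # 1/5 correct
]
weights = [accuracy(p, val_y) for p in val_preds]    # [1.0, 0.6, 0.2]
print(weighted_vote(["b", "a", "a"], weights, ["a", "b"]))  # b (majority voting would give a)
```

Only the label outputs are needed, matching the assumption above that the classifiers expose class labels rather than scores.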


2.3. Evaluation 

In this step, the performance of multiple classifiers constructed by the proposed ant system and 
weighted voting (ASWV) method is measured and compared with several other ensemble methods. 
Experiments were conducted on 9 (nine) benchmark datasets taken from the University California, Irvine 
(UCI) repository. The k-Nearest Neighbour (k-NN) ensemble has also been used in the experiments. Table 1 
shows a summary of the datasets used in the experiments. 


Table 1. Summary of Datasets 


| No. | Dataset | Number of Instances | Number of Classes | Number of Features | Feature Types |
|---|---|---|---|---|---|
| 1 | Haberman | 306 | 2 | 3 | Integer |
| 2 | Iris | 150 | 3 | 4 | Real |
| 3 | Lenses | 24 | 3 | 4 | Categorical |
| 4 | Liver | 345 | 2 | 6 | Categorical, Integer, Real |
| 5 | Ecoli | 336 | 8 | 7 | Real |
| 6 | Pima Indians Diabetes | 768 | 2 | 8 | Integer, Real |
| 7 | Tic-Tac-Toe | 958 | 2 | 9 | Categorical |
| 8 | Glass | 214 | 6 | 9 | Real |
| 9 | Breast Cancer (Wisconsin) | 699 | 2 | 9 | Categorical |


The k-fold cross-validation method was applied to obtain the classification accuracy [32]. The labelled samples are randomly partitioned into k disjoint folds of equal size. In each round, one of the k folds is held out as the testing set and the remaining $k-1$ folds form the training set, with the assumption that there is at least one sample per class; over the k rounds, each fold serves as the testing set once. The classification accuracy (acc) is the ratio of the number of correctly classified instances to the total number of instances, as shown in Equation 2.


$$acc = \frac{\text{no. of correctly classified instances}}{\text{total number of instances}} \times 100\% \qquad (2)$$
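The fold construction and Equation 2 can be sketched as follows, assuming instances are identified by their 0-based index and using a plain (non-stratified) shuffle, so the at-least-one-sample-per-class condition is not enforced here; `kfold_splits` and `accuracy_percent` are hypothetical names.

```python
import random

def kfold_splits(n, k, seed=None):
    """Yield (train_idx, test_idx) pairs for k-fold CV over instances 0..n-1.

    The indices are shuffled once and cut into k disjoint folds; each fold is
    the test set exactly once (plain split, not stratified by class).
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def accuracy_percent(y_true, y_pred):
    """Equation 2: correctly classified instances / total instances x 100%."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

print([len(test) for _, test in kfold_splits(25, 10, seed=3)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(accuracy_percent(["a", "b", "b", "a"], ["a", "b", "a", "a"]))  # 75.0
```

When n is not a multiple of k the folds cannot be exactly equal, so the sketch keeps their sizes within one instance of each other.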

Finally, the overall classification accuracy is estimated by dividing the sum of the classification accuracies of all rounds by the number of folds (rounds), as shown in Equation 3.


$$acc_{cv} = \frac{1}{k} \sum_{i=1}^{k} acc_i \qquad (3)$$


where $acc_i$ is the classification accuracy of round $i$ and $k$ is the number of folds. A common choice for k-fold cross-validation is k = 10; extensive experiments have shown that ten folds is the best choice for obtaining an accurate estimate [33]-[35]. Because reliable performance estimates and comparisons require a large number of estimates, the experiments in this research use 10-fold cross-validation repeated ten times.
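A minimal sketch of the ten-times-repeated 10-fold protocol, assuming a hypothetical callback `fit_predict(X_train, y_train, X_test)` that trains the classifier or ensemble under evaluation and returns predicted labels. Each repetition reshuffles the folds and averages the fold accuracies as in Equation 3; the function then returns the mean and sample standard deviation over repetitions. Whether the standard deviations reported in this paper are computed exactly this way is an assumption.

```python
import random
import statistics

def repeated_cv_accuracy(X, y, fit_predict, k=10, repeats=10, seed=0):
    """Repeat k-fold CV `repeats` times; return (mean, sample SD) of the per-run Eq. (3) averages."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        order = list(range(len(y)))
        rng.shuffle(order)                      # fresh fold assignment for every repetition
        folds = [order[i::k] for i in range(k)]
        fold_accs = []
        for test in folds:
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            correct = sum(p == y[i] for p, i in zip(preds, test))
            fold_accs.append(100.0 * correct / len(test))    # Eq. (2), in percent
        run_scores.append(sum(fold_accs) / k)                # Eq. (3)
    return statistics.mean(run_scores), statistics.stdev(run_scores)

def majority_class(X_train, y_train, X_test):
    """Toy baseline: always predict the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_test)

X = [[i] for i in range(20)]
y = ["a"] * 20
print(repeated_cv_accuracy(X, y, majority_class))  # (100.0, 0.0)
```

Reshuffling between repetitions is what makes the ten runs differ; with a single fixed split, repeating it would only reproduce the same number.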


3. RESULTS AND ANALYSIS 

The ant system algorithm was used to partition the feature set, and weighted voting was used to combine the classifier outputs. Experiments were carried out on nine datasets from the UCI repository, and ten runs of 10-fold cross-validation were performed to validate the accuracy of single k-NN and of the constructed k-NN ensembles. Table 2 shows the average and standard deviation of the classification accuracies of single k-NN, of k-NN ensembles constructed with the random subspace method and of k-NN ensembles constructed with the proposed ant system-based feature set partitioning. The standard deviations are small for all methods, which indicates that the experiments were stable. The average accuracy of the multiple k-NN constructed by the proposed method was compared with the average accuracies of the original single k-NN and of the k-NN ensembles constructed by the random subspace method. The proposed method gives higher average accuracy than the random subspace method on all nine datasets, and higher average accuracy than single k-NN on every dataset except Haberman (68.53 versus 68.83). The comparison of accuracies is shown in Table 2.


Table 2. The Accuracy of Single k-NN, Random Subspace and Proposed Method 


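| No | Dataset | Single k-NN: Average | Single k-NN: Standard Deviation | k-NN with Random Subspace: Average | k-NN with Random Subspace: Standard Deviation | k-NN with Ant System (Proposed Method): Average | k-NN with Ant System (Proposed Method): Standard Deviation |
|---|---|---|---|---|---|---|---|
| 1 | Haberman | 68.83 | 1.37 | 67.91 | 1.96 | 68.53 | 0.79 |
| 2 | Iris | 95.67 | 0.47 | 93.40 | 0.47 | 96.34 | 0.35 |
| 3 | Lenses | 77.92 | 2.81 | 62.50 | 4.17 | 86.67 | 1.76 |
| 4 | Liver | 62.32 | 1.00 | 60.06 | 3.48 | 65.48 | 1.35 |
| 5 | Ecoli | 81.19 | 0.61 | 81.19 | 1.70 | 81.91 | 0.31 |
| 6 | Pima | 67.37 | 0.81 | 70.59 | 1.32 | 71.22 | 0.00 |
| 7 | Tic-Tac-Toe | 75.77 | 0.45 | 75.70 | 2.19 | 78.81 | 0.39 |
| 8 | Glass | 72.71 | 0.83 | 72.71 | 1.86 | 73.54 | 0.43 |
| 9 | Breast Cancer | 95.78 | 0.28 | 97.23 | 0.31 | 98.09 | 0.00 |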
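The per-dataset differences behind this comparison can be recomputed from the averages in Table 2. The short script below (values copied from Table 2; the variable names are hypothetical) prints the gain of the proposed method over each baseline and counts the datasets on which the gain is positive.

```python
# Average accuracies (%) from Table 2: (single k-NN, random subspace, proposed method).
table2 = {
    "Haberman": (68.83, 67.91, 68.53),
    "Iris": (95.67, 93.40, 96.34),
    "Lenses": (77.92, 62.50, 86.67),
    "Liver": (62.32, 60.06, 65.48),
    "Ecoli": (81.19, 81.19, 81.91),
    "Pima": (67.37, 70.59, 71.22),
    "Tic-Tac-Toe": (75.77, 75.70, 78.81),
    "Glass": (72.71, 72.71, 73.54),
    "Breast Cancer": (95.78, 97.23, 98.09),
}

# Gain of the proposed method over each baseline, in percentage points.
gain_vs_single = {d: round(p - s, 2) for d, (s, _, p) in table2.items()}
gain_vs_rs = {d: round(p - r, 2) for d, (_, r, p) in table2.items()}

for d in table2:
    print(f"{d:14s} {gain_vs_single[d]:+6.2f} {gain_vs_rs[d]:+6.2f}")

print(sum(g > 0 for g in gain_vs_single.values()), "of", len(table2))  # 8 of 9
print(sum(g > 0 for g in gain_vs_rs.values()), "of", len(table2))      # 9 of 9
```

The largest gains appear on Lenses, the smallest dataset in Table 1, so those differences are also the most sensitive to how the 24 instances fall into folds.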


The proposed algorithm was successfully applied to form feature set partitions. Table 3 summarizes the results of applying the proposed algorithm, listing for each dataset the feature set partition obtained and the resulting number of classifiers.


Table 3. Obtained Feature Set Partition and Number of Classifiers 


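| No | Dataset | Partition | Number of Classifiers |
|---|---|---|---|
| 1 | Haberman | [1 3][2] | 2 |
| 2 | Iris | [1 2 3 4] | 1 |
| 3 | Lenses | [1 2 3 4] | 1 |
| 4 | Liver | [1 4 6][3 5][2] | 3 |
| 5 | Ecoli | [1 2 3 4 5 6 7] | 1 |
| 6 | Pima | [1 3 4 7][5 6 8][2] | 3 |
| 7 | Tic-Tac-Toe | [1 2 3 4 5 6 7 8 9] | 1 |
| 8 | Glass | [1 2 3 4 5 6 7 8 9] | 1 |
| 9 | Breast Cancer | [1 2 4 7 9][3 5][6][8] | 4 |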
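As a consistency check, the multi-group partitions in Table 3 can be verified to be disjoint covers of each dataset's features (feature counts from Table 1), with one classifier per group. The sketch below keeps the table's 1-based feature indices; `check_partition` and the dictionary names are hypothetical.

```python
# Multi-group partitions from Table 3 (1-based feature indices).
table3 = {
    "Haberman": [[1, 3], [2]],
    "Liver": [[1, 4, 6], [3, 5], [2]],
    "Pima": [[1, 3, 4, 7], [5, 6, 8], [2]],
    "Breast Cancer": [[1, 2, 4, 7, 9], [3, 5], [6], [8]],
}
n_features = {"Haberman": 3, "Liver": 6, "Pima": 8, "Breast Cancer": 9}  # from Table 1

def check_partition(groups, n):
    """True if the groups are non-empty, pairwise disjoint and together cover features 1..n."""
    flat = [f for g in groups for f in g]
    return all(groups) and len(flat) == len(set(flat)) and sorted(flat) == list(range(1, n + 1))

for name, groups in table3.items():
    # One classifier is trained per group, so the ensemble size is len(groups).
    print(name, check_partition(groups, n_features[name]), len(groups))
```

The printed group counts (2, 3, 3 and 4) match the "Number of Classifiers" column, illustrating that the ensemble size follows from the partition rather than being fixed in advance.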


The accuracy of the proposed method was also compared with other common methods, as shown in Table 4. The results were compared with: (1) the single classifier approach, (2) dynamic weighted voting [28], (3) improved k-NN classification using a genetic algorithm (GA k-NN) [36], (4) simultaneous metaheuristic feature selection (SMFS) [37], (5) the weighted k-NN ensemble method [27], (6) the direct boosting algorithm [38], (7) the cluster-oriented ensemble classifier (COEC) [39] and (8) the evidential neural network [40]. The k-NN classifier was used as the base classifier. The results show that the proposed method gives the best classification accuracy among the compared methods on the Haberman and Breast Cancer datasets. In general, the proposed method gives good classification results and is comparable with the other methods.


Table 4. Comparison of Accuracies with Common Ensemble Methods 


| Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Haberman | 66.83 | - | - | - | 71.89 | - | - | - | 72.75 |
| Iris | 95.67 | 97.33 | - | - | 95.20 | 96.70 | 96.00 | 94.93 | 96.34 |
| Ecoli | 81.19 | - | - | - | 82.89 | - | - | - | 81.91 |
| Glass | 72.71 | - | - | - | 74.23 | 72.50 | - | - | 73.54 |
| Pima | 67.37 | 72.68 | - | 71.90 | - | 75.70 | - | 71.79 | 71.22 |
| Breast Cancer | 95.78 | 96.35 | 97.92 | 97.50 | - | - | 97.72 | - | 98.09 |

1. Single k-NN; 2. Dynamic weighted voting; 3. GA k-NN; 4. SMFS; 5. Weighted k-NN ensemble method; 6. Direct boosting algorithm; 7. COEC; 8. Evidential neural network; 9. Proposed ASWV method


4. CONCLUSION 

A new method based on the integration of the ant system and weighted voting for multiple classifier systems has been presented. The ant system was applied to optimize the feature set partition, while weighted voting was used as the combiner. Experimental results show that combining several k-NNs as base classifiers with this method outperforms single k-NN and is comparable with other ensemble methods. The results indicate that the proposed method can be applied to generate better k-NN ensembles. Furthermore, the method determines the number of combined classifiers from the number of partitions formed.

Future research will apply this method to other classifiers such as Support Vector Machines, Neural Networks and Decision Trees. A dynamic feature partition selection approach could also be considered to enhance the performance of the method. Because the method is expected to partition the feature set into several lower-dimensional feature subsets, a set of classifiers could process low-dimensional feature vectors simultaneously; its ability to cope with high-dimensional data and small training samples should therefore also be tested.